This dataset includes chemical properties of white wine that influence its quality. It contains 13 variables and 4898 observations. All variables are numeric: quality is discrete, X is a row index, and the remaining variables are continuous.
## [1] 4898 13
## 'data.frame': 4898 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00
## Median : 5.200 Median :0.04300 Median : 34.00
## Mean : 6.391 Mean :0.04577 Mean : 35.31
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00
## Max. :65.800 Max. :0.34600 Max. :289.00
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.720 Min. :0.2200
## 1st Qu.:108.0 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100
## Median :134.0 Median :0.9937 Median :3.180 Median :0.4700
## Mean :138.4 Mean :0.9940 Mean :3.188 Mean :0.4898
## 3rd Qu.:167.0 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500
## Max. :440.0 Max. :1.0390 Max. :3.820 Max. :1.0800
## alcohol quality
## Min. : 8.00 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.40 Median :6.000
## Mean :10.51 Mean :5.878
## 3rd Qu.:11.40 3rd Qu.:6.000
## Max. :14.20 Max. :9.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
The median of fixed acidity is 6.8. The distribution is slightly right-skewed, with some outliers above 11.
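One way to make "outlier" concrete is the common 1.5 × IQR convention. The sketch below is an illustration only, not necessarily the criterion used for the plots in this report. It computes the fences for fixed acidity from the quartiles printed above; `iqr_fences` is a hypothetical helper name.

```r
# Tukey-style fences from two quartiles.
# ASSUMPTION: k = 1.5 is a common convention, not a rule stated in this report.
iqr_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles of fixed.acidity as printed in the summary above.
fences <- iqr_fences(q1 = 6.3, q3 = 7.3)
print(fences)

# Check whether the reported maximum (14.2) lies beyond the upper fence.
max_flagged <- 14.2 > fences[["upper"]]
print(max_flagged)
```

The same helper can be applied to any other variable by passing in its 1st and 3rd quartiles from the summary table.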
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
The median of volatile acidity is 0.26. The distribution is right-skewed, with a single peak and a right tail. There are outliers above 0.9.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
The distribution of citric acid is close to normal around its main peak, but it has a long right tail and one visible outlier at about 0.9. The median is 0.32.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
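The distribution of residual sugar is extremely right-skewed. The median is 5.2 while the maximum is 65.8, far above the 3rd quartile of 9.9, so the right tail does contain extreme values (under the 1.5 × IQR convention the upper fence is 22.2).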
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
The distribution of chlorides looks normal around its main peak but has a very long right tail. The median is 0.043.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
The distribution of free sulfur dioxide is right-skewed and concentrated around the median of 34. There are a few outliers on the right side of the plot.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
The distribution of total sulfur dioxide is right-skewed, with outliers in the higher range (above 300).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
The distribution of density is right-skewed and concentrated around the median of 0.994. The plot shows some outliers near 1.01.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
The distribution of pH is unimodal and approximately normal. The median is 3.18, the 1st quartile is 3.09, and the 3rd quartile is 3.28.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
The distribution of sulphates is asymmetric and shows bimodal behavior. It is slightly right-skewed, with a right tail. The median is 0.47 and the mean is 0.49.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
The distribution of alcohol is right-skewed and uneven. It is multimodal, with three visible peaks at about 9, 11, and 12.5.
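The right-skew remarks above can be cross-checked against the printed summaries with a crude heuristic: in a right-skewed distribution the mean usually sits above the median. The sketch below scales that gap by the IQR so variables on different scales are comparable. This is a rough indicator chosen for illustration, not a formal skewness statistic, and the column name `gap` is arbitrary.

```r
# Quartiles, medians and means copied from the summary() output above.
stats <- data.frame(
  variable = c("fixed.acidity", "volatile.acidity", "citric.acid",
               "residual.sugar", "chlorides", "free.sulfur.dioxide",
               "total.sulfur.dioxide", "density", "pH", "sulphates", "alcohol"),
  q1     = c(6.3, 0.21, 0.27, 1.7, 0.036, 23, 108, 0.9917, 3.09, 0.41, 9.5),
  median = c(6.8, 0.26, 0.32, 5.2, 0.043, 34, 134, 0.9937, 3.18, 0.47, 10.4),
  mean   = c(6.855, 0.2782, 0.3342, 6.391, 0.04577, 35.31, 138.4, 0.9940,
             3.188, 0.4898, 10.51),
  q3     = c(7.3, 0.32, 0.39, 9.9, 0.05, 46, 167, 0.9961, 3.28, 0.55, 11.4),
  stringsAsFactors = FALSE
)

# (mean - median) / IQR: positive values hint at a right tail.
stats$gap <- (stats$mean - stats$median) / (stats$q3 - stats$q1)

# Sort from most symmetric to most skewed according to this heuristic.
stats <- stats[order(stats$gap), ]
print(stats[, c("variable", "gap")], row.names = FALSE)
```

Because the heuristic only uses four summary numbers per variable, it cannot detect multiple peaks or isolated extreme values; the histograms remain the primary evidence.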
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
The values are not continuous. The distribution is roughly normal, with one peak in the middle. The 1st quartile is 5 and the 3rd quartile is 6. The distance from the minimum to the median is 3, and the distance from the median to the maximum is also 3. There are no outliers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
To improve the graph, I will set explicit breaks on the x scale and change the bin width.
About 2,200 white wines have a quality grade of 6 and around 1,450 have a grade of 5, which means roughly 3,650 white wines have a good quality rating. Around 1,060 white wines have an excellent rating (above 6), and fewer than 200 have a bad rating (below 5). Overall, white wine quality is good.
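Bar heights read off a histogram are approximate, so it helps to check any per-grade counts against the totals that are printed in this report. The sketch below uses per-grade counts that are an assumption (they are not printed anywhere above) and verifies that they reproduce the reported 4898 observations and mean quality of 5.878. It also bins the values with unit-width breaks centred on the integer grades, which is one way to realize the break and bin-width adjustment mentioned above.

```r
# ASSUMPTION: per-grade counts for the white wine data; these numbers are
# not shown in the report and are used here for illustration only.
counts <- c(`3` = 20, `4` = 163, `5` = 1457, `6` = 2198,
            `7` = 880, `8` = 175, `9` = 5)
grades <- as.integer(names(counts))

# Consistency checks against figures printed by dim() and summary().
total        <- sum(counts)
mean_quality <- sum(grades * counts) / total

# Rebuild the quality vector and bin it with one bin per integer grade.
quality <- rep(grades, times = counts)
h <- hist(quality, breaks = seq(2.5, 9.5, by = 1), plot = FALSE)

# Percentage of wines at each grade.
shares <- round(100 * counts / total, 1)

print(total)
print(round(mean_quality, 3))
print(shares)
```

With the real data frame, `table()` on the quality column would give the counts directly without reading them from a plot.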
The dataset contains 12 variables (not counting the row index X) and 4898 observations. Eleven variables describe the chemical composition of white wine, and one variable, quality, records the final result of that composition.
Quality. I want to know which components raise the quality of white wine.
Alcohol. I think alcohol is the component that contributes most to the quality of wine.
No, I did not.

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
The possible values for quality range from 0 to 10, but in this white wine dataset the values range from 3 to 9. This means either that there are no extremely bad white wines or that no data was collected on them. It also suggests that white wines tend to be rated good rather than bad, since the maximum is 9 and the minimum is 3.
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
It is clear that as alcohol increases, the quality of the wine increases.
From the above graph, lower density means higher quality.
A very weak relation: higher quality grades have lower residual sugar.
From the above graph, lower chlorides means higher quality of wine.
A strong relation between alcohol and density: as density decreases, alcohol increases.
There is a very strong relation: as density increases, residual sugar increases.
As fixed acidity increases, pH decreases.
### How did the feature(s) of interest vary with other features in the dataset?
From the correlations above, quality is most correlated with alcohol, density, and chlorides, and only weakly with residual sugar. There are also correlations between pH and fixed acidity and between density and alcohol.
Yes: a positive correlation between density and residual sugar, and a negative correlation between alcohol and density.
The strongest relationship I found was between density and residual sugar, with a correlation above 0.8.
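For reference, a Pearson correlation like the one quoted here comes from base R's `cor()`. The sketch below only illustrates the call: it uses the first five rows that `str()` prints above (rounded for display), which is far too few points to draw any conclusion, so the printed value should not be compared with the figure for the full dataset.

```r
# First five rows as printed by str() above (values are rounded for display).
density        <- c(1.001, 0.994, 0.995, 0.996, 0.996)
residual_sugar <- c(20.7, 1.6, 6.9, 8.5, 8.5)

# Pearson correlation; cor() uses method = "pearson" by default.
r <- cor(density, residual_sugar)
print(r)
```

On the full data, the same call would take the two columns of the data frame, for example `cor(wine$density, wine$residual.sugar)`, where `wine` is a hypothetical name for the loaded dataset.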
The quality of white wine is high when alcohol is high and density is low. I don't see much effect from residual sugar.
I don't see much variation with chlorides.
When total sulfur dioxide is low and alcohol is high, quality is high.
There are too many outliers, and volatile acidity shows no visible impact on quality.
The clearest relation is between alcohol and quality. There is also a weak inverse relation between quality and density, and between quality and total sulfur dioxide.
The interesting part is that only one variable shows a clear impact on white wine quality.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
This is the bar chart for the quality variable. It shows that quality values range from 3 to 9. Grade 6 is the most common and grade 9 is the least common.
The above plot shows that white wines with high density have a low concentration of alcohol: a negative correlation between density and alcohol.
The above graph shows that as density decreases and alcohol increases, quality increases. There is a positive correlation between alcohol and quality, and a negative correlation between density and alcohol and between density and quality.
The biggest challenge for me was understanding the data: I have no background in wine, which was a weakness in this project. I was also surprised that I did not find any strong factor for white wine quality other than alcohol.